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SPEDEN is a computer program that reconstructs the electron density of single 
particles from their x-ray diffraction patterns, using a single-particle adaptation of 
the Holographic Method in crystallography. (Szoke, A., Szoke, H, and Somoza, 
J.R., 1997. Acta Cryst. A53, 291-313.) The method, like its parent, is unique that 
it does not rely on "back" transformation from the diffraction pattern into real 
space and on interpolation within measured data. It is designed to deal success- 
fully with sparse, irregular, incomplete and noisy data. It is also designed to use 
prior information for ensuring sensible results and for reliable convergence. This 
article describes the theoretical basis for the reconstruction algorithm, its imple- 
mentation and quantitative results of tests on synthetic and experimentally ob- 
tained data. The program could be used for determining the structure of radiation 
tolerant samples and, eventually, of large biological molecular structures without 
the need for crystallization. 



1. Introduction 



>■ This paper describes our computer program SPEDEN that re- 
constructs the density from the diffraction patterns of individ- 



On 



ual particles. SPEDEN is of interest for three reasons. Diffrac- 



tive imaging promises to improve the resolution, sensitivity, 
(~~) and practical wavelength range in X-ray microscopy, for three- 
T sj" dimensional objects that are tolerant to X-rays. A few exam- 
pies are defects in semiconductor structures, phase separation 
c/5 in alloys, nano-scale machines and laser fusion targets. A long- 
. 'term vision is the possibility of high-resolution reconstruction 
^ of diffraction patterns of single bio-molecules. Of broad theo- 
J^ retical interest is Speden's unique approach to the reconstruc- 
ts | tion of scatterers - a difficult mathematical problem. In the rest 
^ of this section we expand on these three topics. 
• , Reconstruction of the electron density from non-uniformly 
sampled, three-dimensional diffraction patterns is of wide in- 
terest and applicability with present-day sources. In radiation- 



tolerant samples, x-ray diffraction and diffraction tomography 
are capable of higher resolution than (straight- or cone-beam) 
tomography alone. In tomography, resolution is limited by the 
quality of the incident beam and by the spatial resolution of 
the detector; in diffraction the resolution can be as fine as the 
wavelength of the incident radiation. Experimentally, diffrac- 
tion imaging has already produced X-ray images at higher res- 
olution than possible with available X-ray optics (Miao et al., 
1999, and He et al., 2002). The price to be paid for these ben- 
efits is the intrinsic difficulty of the reconstruction. Neverthe- 
less, several successful reconstructions from experimental X- 
ray data, using the iterative hybrid input-output version of the 
Gerchberg-Saxton-Fienup (GSF) algorithm, have been reported 
recently (Miao et al., 2002, and Marchesini et al., 2003). The 
first successful application of this algorithm to electron diffrac- 



tion data was reported in 2001 (Weierstall et al., 2002), and 
it has been used more recently to produce the first atomic- 
resolution image of a single carbon nanotube (Zuo et al., 2003). 
In biology, the use of the GSF has recently been shown to dra- 
matically reduce the number of images needed for tomographic 
cryoelectronmicroscopy of protein monolayer crystals, so that 
phasing can be based mainly on the three-dimensional diffrac- 
tion data (Spence et al., 2003). 

The development of SPEDEN was also prompted by the 
promise of new ways to image bio-molecules. Free-electron 
lasers can, in principle, provide X-ray pulses of tens to hundreds 
of femtoseconds in length and brightness up to ten orders of 
magnitude greater than synchrotron radiation. It was predicted 
that, under such circumstances, it should be possible to dispense 
with crystals and reconstruct the electron density of single bi- 
ological particles from their diffraction pattern (Neutze et al., 
2000). In proposed experiments, a large number of single parti- 
cles will be injected into the X-ray beam in random orientation 
and their diffraction patterns will be recorded, each in a single 
shot of the free-electron laser. Such diffraction patterns will be 
very noisy and their resolution will be limited by the signal to 
noise (S/N) ratio. The measured diffraction patterns that corre- 
spond to different orientations of the particle will be classified 
into a number of mutually exclusive classes. The images within 
each class will then be averaged and the class averages assem- 
bled into a three-dimensional diffraction pattern by finding their 
mutual orientation relationships. Finally, the three-dimensional 
diffraction pattern will be reconstructed to yield the electron 
density of the molecule. 

We have worked on the analysis of all three steps of such an 
experiment. The essence of the first analysis is that the maxi- 
mum X-ray intensity at a given pulse length is limited by the 
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requirement that the molecule stay intact during the pulse, even 
though it eventually disintegrates (Hau-Riege et al., 2003). The 
second analysis discusses the division of noisy diffraction pat- 
terns into a number of distinct classes. If the images are divided 
into too few classes, the available resolution is not realized. If 
the patterns are divided into too many classes, the class averages 
will be poor and the pattern quality suffers. The individual class 
averages, each corresponding to a well-defined orientation of 
the particle, will be assembled into a three-dimensional diffrac- 
tion pattern. The result will be a three-dimensional diffraction 
pattern that is measured at a limited number of orientations. It 
will be, therefore, sparse, irregular and will have a limited signal 
to noise ratio (Huldt et al., 2003). 

The program Speden, described in this paper, provides a 
way to optimally determine the electron density from such a 
three-dimensional ensemble of continuous diffraction patterns. 
We first give an analysis of their properties and discuss the 
methods and the expected difficulties of reconstructing a "sen- 
sible" electron density from them. We then describe how SPE- 
DEN adapts the Holographic Method (Szoke et al., 1997a) in 
crystallography to deal with continuous diffraction patterns as 
opposed to discrete Bragg spots; this will be discussed in the 
next section. We then report quantitative results of preliminary 
tests for verifying the correctness of our method. These tests 
use computed and measured diffraction patterns from samples 
of inorganic particles. 

2. SPEDEN: The Program 

2.1. Theoretical Considerations 

2.1.1. Mathematical background The reconstruction of the 
density of scatterers from its diffraction pattern is an "inverse 
problem". Other, well-studied inverse problems are those of 
computed tomography, image deblurring, phase recovery in as- 
tronomy, and crystallography. In tomography, for example, an 
inversion algorithm (e.g. filtered back projection) is used to re- 
cover the density of scatterers from the measured tomograms. It 
is widely recognized that the reconstructed density is very sen- 
sitive to inaccuracies in the measurement. Small errors in the 
diffraction pattern cause large errors in the reconstruction. This 
property is called ill-posedness or ill conditioning. 

The reconstruction of the electron density from x-ray diffrac- 
tion patterns is indeed ill conditioned. It also has two additional 
difficulties. First, in contrast to tomography, there are no direct 
inversion algorithms - not even approximate ones. Second, the 
reconstructed electron density at any sample point is influenced 
strongly by the electron densities of all sample points, as op- 
posed to a limited number of them. Therefore, errors in density 
are non-local and "propagate" far. 

Fortunately, very good fundamental discussions of these sub- 
jects are provided in the books of Daubechies (Daubechies, 
1992), Bertero and Boccacci (Bertero et al., 1998, Bertero, 
1989) and Natterer (Natterer, 1996, Natterer and Wubbeling, 
2001). In somewhat simplified terms, the reconstruction of 
the electron density is similar to finding the inverse of an ill- 
conditioned non-square matrix, a subject thoroughly discussed 



in Golub and van Loan (Golub et al., 1996). We consider these 
mathematical properties to be essential for understanding the 
successes and limitations of reconstruction algorithms; we will 
try to be fully cognizant of them in the discussion that follows. 

2.2. The phase problem of crystallography, oversampling 

The crystallographic phase problem is a good starting point 
for further discussion. It was first realized by Sayre (Sayre, 
1952) that the number of observable complex structure factors, 
limited by the Bragg condition, is equivalent to a critical sam- 
pling of the electron density in the unit cell of the crystal. The 
sampling theorem of Whittaker and Shannon teaches us that, 
if the amplitudes and phases of all the diffraction peaks were 
accurately measured, the electron density could, in principle, 
be reconstructed everywhere (Bricogne, 1992). Unfortunately, 
only the amplitudes of the Bragg reflections are measured, not 
their phases. Therefore there is not enough information in the 
diffraction pattern for a unique reconstruction of the electron 
density. Sayre proposed (Sayre, 1980) that if we could mea- 
sure the diffraction amplitudes "in between" the Bragg peaks, 
we should have enough information to reconstruct the electron 
density, or to "phase" the diffraction pattern. This is exactly the 
situation in diffraction from a single particle. 

Nevertheless, it is still difficult to reconstruct the electron 
density, even from a well "oversampled" diffraction pattern. 
One corollary of critical sampling is that the amplitudes and 
phases of the Bragg reflections of a crystal are independent of 
one other, but any structure factor in between them depends on 
the surrounding ones to some extent. Therefore, too much over- 
sampling does not help to obtain independent data, although it 
does improve the S/N ratio by reducing the noise. Ill posedness 
is still with us, although with oversampling, the error propagates 
less. An additional difficulty with diffraction patterns from a set 
of discrete orientations of a particle is that at low resolution the 
diffraction pattern is well oversampled while at high resolution 
the sampling is sparse. A fundamental property of diffraction 
is that the position and the handedness of the electron density 
are undetermined, resulting sometimes in stagnation of the al- 
gorithm and drift in the position of the results (Stark, 1987). 

There are two well-known, necessary remedies for the lack of 
information and for the ill posedness of the reconstruction prob- 
lem. The more important one is the need for more information. 
For example, one way to include a priori knowledge is to accept 
reconstructed electron densities only if they are "reasonable". 
The second remedy is to use "stabilized" or pseudo-inversion 
algorithms. In the next section we introduce our version of a 
real space reconstruction algorithm; we will argue that our al- 
gorithm deals with all these problems optimally, at least in some 
sense. We return to the comparison of our algorithm with other 
methods for phase-recovery in Section 2.3. 

subsectionSpeden, a real-space algorithm 

In this section we outline the workings of our reconstruction 
program, Speden. Speden uses a real-space method for recon- 
struction; its acronym stands for Single Particle Electron DEN- 
sity. For computational efficiency the particle to be recovered 
is put into a fictitious unit cell that is several times larger than 
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the particle itself. All reconstruction algorithms use this artifice 
in order to be able to calculate structure factors by fast Fourier 
transform techniques. The resulting similarity with crystallogra- 
phy enables the use of many crystallographic concepts. In fact, 
the recognition of this similarity enabled us to write a program, 
Speden, based on our crystallographic program, Eden, with 
relatively small modifications. 

The most significant difference between the two programs 
is that in crystallography the Bragg condition restricts the re- 
ciprocal lattice vectors to integer values, while the continuous 
diffraction pattern can be - and usually is - measured at arbitrary, 
non-integer values of the reciprocal lattice vectors. 

In Speden, in common with Eden, the (unknown) electron 
density is represented by a set of Gaussian basis functions, with 
unknown amplitudes, that fill the fictitious unit cell uniformly. 
This way the recovery is reduced to the solution of a large set 
of quadratic equations. The program "solves" these equations 
by finding the number of electrons in each basis function so 
as to agree optimally with the measured diffraction intensities 
as well as with other "prior knowledge". Prior knowledge in- 
cludes the emptiness of the unit cell outside the molecule, the 
positivity of the electron density, possibly some low-resolution 
image of the object, etc. Each one of those conditions is de- 
scribed by a cost function that measures the deviation of the cal- 
culated data from the observed data. One of the cost functions 
describes the deviation of the calculated diffraction pattern from 
the measured one; others depend on the deviation of the recov- 
ered density from prior knowledge. Measured data are weighted 
by their certainty (inverse uncertainty), other prior knowledge is 
weighted by its "reliability". The mathematical method used is 
(constrained) conjugate gradient optimization of the sum of cost 
functions. At each step of the optimization, there is a set of am- 
plitudes available that describe the current electron density in 
the full unit cell. A full set of structure factors is calculated by 
Fourier transforming the current electron density. When the unit 
cell is larger than the particle, the structure factors can be sta- 
bly interpolated to compare them with measured structure factor 
amplitudes. 

We refer to the cited literature that shows that the procedure 
we outlined is equivalent to a stabilized (quasi) solution of the 
inverse problem (Daubechies, 1992; Bertero et al., 1998; and 
Natterer et al., 2001). As such, it is optimally suited for sparse, 
irregular, incomplete and noisy data. 

In the following subsections we describe very briefly the 
common features of Eden and Speden as well as their differ- 
ences. A more complete description of Eden can be found in 
previous papers (Szoke et al., 1997a, and Szoke, 1998). 

2.2.1. Representation of the electron density The electron 
density is represented as a sum of basis functions, adapted to 
the resolution of the data. Specifically, we take little Gaussian 
"blobs" of width comparable to the resolution, and put their cen- 
ters on a regular grid that fills the "unit cell" and whose grid 
spacing is comparable to the resolution. The amplitudes of the 
Gaussians are proportional to the local electron density. In fact, 
the number of electrons contained in each Gaussian constitutes 



the set of our basic unknowns. The above is identical to the rep- 
resentation of the electron density in Eden. 

Some mathematical details follow. The actual formula for the 
representation of the electron density as a sum of Gaussians is 



^unknown (l*) 



(TrryAr 2 ) 3 ' 2 



exp 



\r-r(p)\ : 
rjAr 2 



■ (1) 



The centers of the Gaussians are positioned at grid points, 
r(p), where p is a counting index. In our fictitious unit cell, 
the grid is orthogonal, the grid spacing is Ar and the centers of 
the Gaussians are usually on two intercalating (body centered) 
grids for best representation of the electron density. The num- 
ber i] of the order unity, determines the width of the Gaussians 
relative to their spacing, Ar. Finally and most importantly, n{p) 
is the unknown number of electrons in the vicinity of the grid 
point r(p). The values of n{p) are real, and in future may also 
be complex-valued to allow for photo-absorption in addition to 
scattering. (The latter can be significant when diffraction mea- 
surements are made at longer X-ray wavelengths.) 

Given n(p) the structure factors can be calculated by 

p 

F ca ic{h) = e ->(H^H) 2 n(p) exp [2nih ■ Tr{p)] , (2) 
P =i 

using a fast Fourier transform. The vector h, a triplet of inte- 
gers, denotes the reciprocal lattice vector, the operator F trans- 
forms from real space (Cartesian) coordinates to fractional co- 
ordinates, and T r denotes the dual transformation. 

The constants appearing in Eqs. (1,2) were discussed in some 
detail previously (Szoke et al., 1997b). For completeness, we 
define them here. The crystallographic B factor is given by 
B = (2irAr) 2 1]. The "crystallographic resolution", d, deter- 
mines the grid spacing, Ar by the relation, Ar « /spaced, where 
/space is a constant of the order unity. For a body-centered lattice 
we set / S p aC e = 0.7 and i] = 0.6. For a simple lattice, we use / spa ce 
= 0.6 and rj = 0.8. 

Note that the Gaussian basis functions are not used in a one- 
to-one correspondence with single atoms, but are simply used 
to describe the 3D electron density at the resolution that is ap- 
propriate to the data resolution. In the special case that the reso- 
lution was about the size of an atom and an atom happened to be 
sitting exactly on a grid point, that atom would be represented 
by a single basis function. If the atom is not on a grid point, 
or if the atom happens to be fat, because of thermal motion, 
that same atom would be represented by many basis functions. 
Similarly, at lower resolution, a single basis function represents 
assemblies of atoms. 

2.2.2. Reciprocal space knowledge The measured diffrac- 
tion pattern of the molecule is proportional to the absolute 
square of the structure factors. In Speden we do account for 
the curvature of the Ewald sphere. There are two subtle points: 
the diffraction pattern is measured only in a finite number of di- 
rections, H(i); and as a rule, those directions are not along the 
reciprocal lattice vectors of the (fictitious) unit cell for a single 
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particle. In other words, the measurement directions, H (z), are 
usually not integers and they are not uniformly distributed in 
reciprocal space. This is the main difference between crystal- 
lography and single particle diffraction and, therefore, between 
EDEN and Speden. The essence of any reconstruction algo- 
rithm is to try to find an electron density distribution such that 
the calculated diffraction pattern matches the observed one. In 
our representation, we try to find a set of n(p), such that 



|F obs (//(/))| 2 = | J F calc (//(0)| 2 



(3) 



for each measurement direction, H{i). Let us assume for a mo- 
ment that H(i) are integers. When the representation of the un- 
known density is substituted from (2), for each measured value 
of H(i), equation (3) becomes a quadratic equation in the un- 
knowns, n(p). The number of equations is the number of mea- 
sured diffraction intensities. It is usually not equal to the number 
of independent unknowns that are the number of grid points in 
the unit cell. The equations usually contain inconsistent infor- 
mation, due to experimental errors. The equations are also ill 
conditioned and therefore their solutions are extremely sensi- 
tive to noise in the data. Under these conditions the equations 
may have many solutions or, more usually, no solution at all. 
Our way of circumventing these problems is to obtain a "quasi 
solution" of (3) by minimizing the discrepancy, or cost function 
(see e.g. Stark, 1987, and Bertero et al., 1998) 

XSpeden = ]T w'(H(i)) 2 [\F obs (H - \F calc {H(i)) |] 2 (4) 

i 

The weights, w'(H(i)) 2 , are usually set to be proportional to 
the inverse square of the uncertainty of the measured structure 
factors, l/er 2 (//(;')). As discussed by Szoke (Szoke, 1999), this 
is equivalent to a maximum likelihood solution of the equations. 

Let us now discuss the first, previously ignored difficulty in 
the reconstruction. When we try to reconstruct the electron den- 
sity from real experimental data, we have to compare the set of 
measured \F \, s (H(i))\, where H(i) are not necessarily integers, 
with the calculated structure factor amplitudes, \F ca i c (h)\, that 
are on a regular grid, i.e. have integer h. In principle, given 
an electron density of the molecule, one could calculate the 
structure factors in the experimental directions. Nevertheless, 
for computational efficiency, we put the (unknown) molecule or 
particle into a fictitious unit cell that is larger than the molecule. 
We will also demand that the Gaussians outside the molecular 
envelope be empty. (In practice, sizes of molecules are known 
from their composition; particle sizes and shapes may be known 
from lower-resolution imaging.) As long as the distances of the 
Gaussian basis functions are kept to be the experimental resolu- 
tion, the number of "independent" unknowns neither increases 
nor decreases, in principle, by this computational device. The 
structure factors are calculated on an integer grid in the large 
unit cell, so they are essentially oversampled in each dimension 
by the same factor of the size of the large cell to the size of the 
molecule. The oversampling allows stable interpolation of the 
calculated structure factor amplitudes from integer h to the frac- 
tional H(i) everywhere, independent of the density of the actual 



measurements. Note that interpolation from fractional H(i) to 
integer h is not always a stable procedure! 

In the present implementation of Speden, we get sufficient 
accuracy with the simplest, tri-linear interpolation in the ampli- 
tudes of \F ca i c (h) \ if we choose the fictitious unit cell to be three 
times larger than the molecule in each dimension. Now, some 
mathematical details: the reciprocal space vector H(/) is within 
a cube, bounded by eight corners h(i,j), ,{j= 1, 8} with inte- 
ger values. Let us denote the fractional parts of the components 
of H(i) as (H,K,L). We define weights for the eight corners, 
w(i, j), by taking the products of the fractional parts of H, or 
(1 - H) with those of K or (1 - K) and L or (1 - L). The cost 
function to be minimized now becomes 



XSpeden = ^V(H(/)) 2 - 

i 

8 

\F obs (K(i))\ - E *(''' jMaiMh j))\ (5) 



A similar approach of applying crystallographic algorithms 
to continuous diffraction data has been done with direct meth- 
ods [Spence et al., 2003]. In this case, however, the Ewald 
sphere was approximated by a plane. 

2.2.3. Real space knowledge (targets) Let us assume that 
we have some, possibly uncertain, knowledge of the electron 
density in parts of the unit cell from an independent source, i.e. 
one that does not come from the X-ray measurement itself. This 
is the kind of knowledge present when the unknown molecule 
is placed into a larger unit cell and we demand that the unit cell 
be empty outside the molecule. This kind of knowledge was 
also referred to as a "sensible" electron density in the introduc- 
tion. We represent this knowledge by a target electron density 
«target(i?) and by a real-space weight function w'(p) 2 . It will be 
desirable that the actual electron density of the molecule, p(r), 
as represented by n(p), be close to the target electron density; 
the weight function w'(p) 2 expresses the strength of our belief 
in the suggested value of the electron density. Note that target 
densities can be assigned in any region of the unit cell indepen- 
dently of those in any other region. The simplest way to express 
the above statement mathematically is to minimize the value of 
the cost function 



Xspace = Xspace 51 W ' l n (p) ~ "target (/>)] 1 
p=l 



(6) 



where X space is a scale factor and A is a normalizing constant, 
described in Somoza et al., 1995. (In the absence of information 
at and around the molecule, weights are generally unity where 
it is known that there is no molecule and zero elsewhere.) The 
same procedure is used in Eden. 
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2.2.4. Low resolution target (phase extension) The knowl- 
edge of the electron density at low resolution can be expressed 
by a low-resolution spatial target. Crystallographers call this 
phase extension. The essence is that, during the process of the 
search for an optimal electron density, we try to keep its low 
resolution component as close to the known density as possible. 
A convenient way to accomplish this is the following. Given 
n(p), we smear out its Gaussian representation and compare it 
to the equally smeared out target. Actually, it is easier to carry 
out the computation in reciprocal space. We define 

Xphasext = Aphasext W* (K)" ' IfsmearO) ~ F tal (h)\ , (7) 

i 

where the current "smeared" structure factors are calculated us- 
ing the low resolution, AR, 

p 

^smear(h) = exp [-r)(nAR \T t \i\ ) 2 ] ^ n(p) exp [2mh • Fr{p)\. 

p=i 

(8) 

The original low-resolution target is prepared analogously 
from the (presumably) known electron density, «tar(p). The 
same procedure is used in Eden. 

2.2.5. Minimization of the cost function In the presence of 
a target density, the actual cost function used in the computer 

program is the Sum of XSpeden (5), Xspace (6) and Xphasext (7) 

Xtotal — XSpeden H - Xspace Xphasext ■ (9) 

The fast algorithm described in Somoza et al., 1995, and 
Szoke et al., 1997b, is always applicable to the calculation of 
the full cost function, (9). There is a clear possibility of defin- 
ing more target functions. They are all added together to form 
Xtotal that is minimized to find the optimum electron density. 

The minimization of the cost function (9) is carried out in 
Speden (as in Eden) by D. Goodman's conjugate gradient al- 
gorithm (Goodman, 1991). It has proven to be very robust and 
efficient in years of use in EDEN. The essential properties of the 
algorithm that make it so advantageous for our application is 
that the positivity of the electron density, n(p) > 0, is always 
enforced and that the gradient vector in real space can be calcu- 
lated by fast Fourier transform. The gradient calculation needed 
only a very simple modification for the interpolation in recip- 
rocal space, Eq. (5). The line search algorithm does not use the 
Hessian, so matrices are never calculated. 

As with any local minimization, global convergence is not 
achieved. We discussed this problem in our previous papers and 
came to the conclusion that, usually, the minimum surface of 
the cost function (7) is so complicated that finding a global min- 
imum would take more computer time than the existence of the 
universe. 

2.3. Comparison to iterative algorithms 

Reconstruction of the scatterer from a continuous diffraction 
pattern has a tangled history replete with repeated discoveries. 
Some of the present authors are also guilty of ignorance of prior 
work. We referred to the pioneering insights in Section 2.1.2. 



The "recent" period of algorithms started with the work of 
Miao, Sayre and Chapman (Miao et al. 1998) who pointed out 
that the fraction of the unit cell where the density is known is an 
important parameter for convergence. In somewhat later work, 
with oversampled structure factors calculated on a regular grid, 
the crystallographic program Eden successfully demonstrated 
the recovery of the electron density using a simulated data set 
from the Photoactive Yellow Protein (Szoke, 1999). The protein 
was put into a fictitious unit cell, twice the size of the original 
one, and a target with zero density was used outside the orig- 
inal unit cell of the protein. Similarly, Miao and Sayre (Miao 
et al., 2000) have studied empirically how much oversampling 
is required in two- and three-dimensional reconstructions of a 
simulated data set; using a version of the Gerchberg - Saxton 
- Fienup (GSF) algorithm. Among recent articles we mention 
Robinson et al (2001), Williams et al (2003), Marchesini et al. 
(2003) and references therein, in addition to those mentioned in 
the Introduction. 

All reconstruction algorithms of oversampled diffraction pat- 
terns use a priori information on the shape and size of the parti- 
cle. In our previous studies in crystals we found that such infor- 
mation is very valuable. For example, Eden converges surpris- 
ingly well for proteins at low resolution where the only infor- 
mation used is that the molecule is a single "blob". Eden also 
converges for synthetic problems with a good knowledge of the 
solvent volume, which is greater than 50% (Beran et al., 1995) 
or 60% (Eden). A similar conclusion was reached in Miao et 
al. (1998). In comparison, when a molecule is embedded in a 
3-fold larger fictitious unit cell, the empty "solvent" occupies 
^96% of the cell volume. 

As discussed previously, the reconstruction of scatterers from 
their diffraction pattern is a difficult mathematical problem. In 
many cases the indeterminacy of the absolute position of the 
object and of its handedness causes difficulties in convergence. 
That is definitely the case with Speden so, in that sense, SPE- 
DEN is not a good algorithm. Empirically, the GSF algorithm 
has a larger radius of convergence and deals better with stagna- 
tion (Marchesini et al. 2003) 

Another family of difficulties arises when there is a priori in- 
formation available, but there is only incomplete and noisy data. 
Under such conditions the main questions are how to find a so- 
lution that optimally takes into account the available informa- 
tion and that is the best "sensible" one that reproduces the noisy 
and incomplete data to its limited accuracy. It is this second set 
of conditions for which SPEDEN was written. Although, in this 
paper, we show only its performance for artificial and "easy" but 
incomplete data, Speden's older sister, Eden has been shown 
to have those properties on a large range of crystallographic data 
sets, ranging from Cu02 to the ribosome. We expect that such 
properties of Eden will be inherited by Speden, considering 
that their fundamental mathematical properties are sufficiently 
similar. 

The best known, and successful class of algorithms is the 
group of iterative transform algorithms that we refer to as 
Gerchberg-Saxton (GS) (Gerchberg et al., 1972), and its devel- 
opment, in which support constraints and feedback are added, 
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the GS-Fienup (GSF) or hybrid input-output algorithm (Fienup 
et al., 1982; Aldroubi et al., 2001; Bauschke et al., 2002; 
Bauschke at al., 2003). The essence of the GSF algorithms is 
that they iterate the N-pixel data between real and reciprocal 
spaces via FFTs and enforce the known constraints in each of 
these spaces. 

Depending on the degree of noise in the data, these algo- 
rithms usually converge in about a hundred to a thousand it- 
erations. The weak convergence (non divergence) of the GS 
algorithm has been proven in the absence of noise (Fienup et 
al., 1982). There is no mathematical proof that these algorithms 
will converge in general, but it is reasonable that by sequen- 
tially projecting onto the set that satisfies the real space con- 
straints and the set that satisfies the reciprocal space constraints, 
the intersection (corresponding to a valid solution) should be 
approached. This is definitely true for projections onto convex 
sets, but unfortunately these sets are not convex (Stark, 1987). 
In practice, despite the fact that the modulus constraint is non- 
convex, the algorithms often converge even in the presence of 
noise in about several hundred iterations. 

The main difference between the GSF algorithms and 
SPEDEN is that Speden does not iteratively project onto the 
sets of solutions that satisfy the real-space or reciprocal space 
constraints separately, but rather it minimizes a cost function 
that includes all the constraints of both spaces. It does this by 
varying quantities in real space only (the n(p)'s) and the cost 
evaluation only requires a forward transform from real space to 
reciprocal space. As the cost function never increases, Speden 
reaches only a local minimum. 

In spite of its "small" radius of convergence, there are some 
expected advantages to Speden's algorithm.. In Speden we 
compare the F ca i c to the F i, s by interpolating from the samples 
of F ca i c (h), calculated on a regular grid, to the measured sample 
vectors H. Since the gridded |F ca i c (^)| are a complete set (due to 
the fact they are sampled above the Nyquist frequency) the in- 
terpolation is stable and performed with little error. Since an in- 
verse transform is required in the GSF algorithms, the measured 
diffraction data \F i, s (H (recorded on Ewald spheres in re- 
ciprocal space) must be interpolated to the gridded data points 
h. The observed data might not be a complete set, especially at 
high resolution where the density of samples is sparser: this may 
lead to error. An additional possible difficulty with the GSF al- 
gorithms is that the effective number of unknowns may increase 
with the size of the fictitious unit cell, while in SPEDEN the ef- 
fective number of unknowns is constant. Finally, in SPEDEN, 
weightings can be properly applied to all data and knowledge. 
The measured data is inversely weighted by its uncertainty; it 
is a procedure equivalent to maximum likelihood methods and 
it should be optimal, at least in theory. As a side effect, when 
the constraints are inconsistent, SPEDEN still converges to a 
well-defined and correct solution. (Note that for noisy data the 
constraints are almost always inconsistent.) Weightings are also 
applied to reflect our confidence in real-space constraints, and 
these weightings are consistently used in real space. 

Tests 

Speden has certain built-in limitations. In particular, of 



course, reconstruction is only as good as the diffraction mea- 
surements and derived structure factor amplitudes are reliable. 
There are also other less obvious limitations. For example, there 
are inherent inaccuracies due to the Gaussian representation of 
the electron density in real space. (Szoke et al., 1997a) Also, 
the trilinear interpolation for representing integer (hkl) structure 
factor amplitudes on a non-integer grid is only approximate. Fi- 
nally, as a fundamental limitation of any reconstruction method, 
both the absolute position and the handedness of the molecule 
are undefined. 

We performed preliminary tests to verify the capabilities and 
limitations of our reconstruction method using computed and 
experimentally obtained diffraction patterns. In this section, we 
first describe the reconstruction of simple "molecules" from 
synthetic diffraction patterns with Speden. Specifically, we 
discuss how the convergence of Speden is affected by the er- 
rors due to interpolation in reciprocal space, by the quantity of 
observed structure factors, |F bs|, by the extent of the "known" 
starting model, and by the uniformity of sampling in recipro- 
cal space. We then describe the reconstruction of simple two- 
dimensional objects from synthetic and measured diffraction 
patterns with Speden. 

2.4. Generation of Synthetic Test Data 

We created synthetic "molecules" in the format of the Pro- 
tein Data Bank (pdb) files (Berman et al., 2000). Each molecule 
was composed of 15 point-like carbon atoms, placed at random 
positions within a cube measuring 16.8 A in each dimension 
and "measured" to 4 A resolution; these values correspond to 
crystallographic B factors of 185.7 A 2 . The molecule was then 
shifted so that its center-of-mass was at the center of the cube. 
All our simulations were repeated using molecules with several 
different random arrangements. Initially, the "unit cell" coin- 
cided with the dimensions of the cube in which the atoms were 
placed; later, larger cells were used and the atoms were posi- 
tioned in their center. We also generated "starting models" by 
removing atoms from the full "molecule". 

Both full and partial molecules served to generate structure 
factors (F ca i c ), using the atomic positions and the B factors. 
Starting models were generated from the F ca i c , using Speden's 
preprocessor, Back, which finds the optimal real-space repre- 
sentation for an input F ca \ c . Initially, sets of "measurements" 
(| ^obs |) were generated by deleting the phases of the calculated 
structure factors of the full or partial molecule. The F Q b s files so 
generated had all integer H(i). The uncertainty of the measured 
structure factors, a(h), were chosen to be 

a (h) = ay/ {F ohs (h')) h , 110 + F obs (h), (10) 

with a = 0.1 for h 7^ (0,0,0), and a = 0.01 fork = (0,0,0). 
We used two alternate methods to generate F D b s files for non- 
integer H(i): In the first method, we used a unit cell whose di- 
mensions were incommensurate with one another and with the 
edge of the cube, but whose volume equaled that of the cube. 
The resulting F b s file, generated again from F ca i c files by delet- 
ing the phases, then had its indices scaled back appropriately, 
yielding fractional H(i). A different second method was used to 
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sample the reciprocal H(i) space non-uniformly: \F \, S \ at frac- 
tional H(i) were calculated using tri-linear interpolation from 
|F bs | data on a regular grid that, in turn, was at four times the 
regular resolution in reciprocal space. 

For each of these F D b s files, some constraining information is 
required in order to find the atom positions. We used two types 
of constraints: one of them identified the (approximate) empty 
region; the other one used F ca i c at a considerably lower reso- 
lution. We call them the empty target and the low-resolution 
target, respectively. Both are based on the assumption that at 
a considerably lower resolution, the general position of atoms 
as one or more "blobs" in empty space is known. The low- 
resolution FS cn \ c was prepared by smearing the full F c . d i c file 
to 10 A. The empty target identified the empty points in terms 
of a mask. Then, throughout Speden's iteration process, using 
the solver SOLVE, the program attempted to match the current 
electron/voxel values in masked-in regions to empty (n(p) = 0) 
values. The phase-extension target used the same FS c - d i c file; 
during the iteration process, at each step, the current real-space 
solution was smeared to that low resolution in reciprocal space 
and restrained to agree with that target. 

2.5. Assessment of Quality of Reconstruction 

Besides inspecting the reconstructed electron density visu- 
ally, we used four quantitative measures to compare the recon- 
structed image with the electron density from the full molecule 
at 4 A: 

a. Real-space RMS error: We calculated the real-space elec- 
tron density from the electron/voxel files, a process we 
call regridding. We then calculated the root-mean-square 
(RMS) error of the electron densities, p x and p2, defined 

as 

RMS=-±-J= =, (11) 

We permitted one file to be shifted with respect to the 
other file, in order to minimize the distance. 

b. Phase difference: We calculated the average (amplitude- 
weighted) phase difference between the F c . d \ c at the end 
of the run and the corresponding F cd i c generated from the 
full molecule. 

c. Final R factor: We calculated the crystallographic R fac- 
tor (Giacovazzo et al., 2002) at the end of the run. 

d. Count error: We compared the integrated real-space elec- 
tron density generated from a run result with the true 
number of electron as identified in the pdb file, on 
an atom-by-atom basis. The integration was performed 
around each atom over a sphere with a radius that was 1 .5 
times the grid spacing. The figure-of-merit is the RMS er- 
ror. 

Of all these measures, the final R factor was the least useful 
to assess the quality of the reconstructed image. The R factor 



tends to be lower for a small number of entries in the obser- 
vation file since there are fewer equations to satisfy during the 
reconstruction. In such a case, a visual inspection shows that 
the reconstructed electron density may have little resemblance 
to the 15 carbon atoms. However, there was a good correlation 
among the other three measures. The solutions looked correct 
when the phase difference between solution and full F ca i c was 
less than 20 ° , the count error between solution and pdb model 
was 0.2 or lower (out of 6), and the RMS distance measure was 
less than 0.2. 

2.6. Overcoming Inaccuracies due to Tri-Linear Interpolation 
Through Oversampling 

For computational purposes, we placed the (unknown) 
molecule or particle into a fictitious unit cell that is larger than 
the molecule, and calculated the structure factors on an integer 
grid in the large unit cell, oversampled by the same ratio: the 
size of the large cell to the size of the molecule. We then cal- 
culated the structure factor at fractional H from the structure 
factor at integer h using tri-linear interpolation. The interpola- 
tion error becomes smaller when larger unit cells are used (at 
the expense of computation time). We studied the effect of the 
unit cell size on convergence of Speden. 

In the initial tests, the unit cell was the same size as the orig- 
inal molecule, and the starting (known) part of the molecule 
consisted of the full molecule. When the molecule consisted of 
atoms on grid points and the F^ files had integer //(;), unsur- 
prisingly Speden converged, as did Eden on the same data set. 
However, we found that SPEDEN did not converge to a unique 
solution if either the atoms were not on grid points or when the 
Fobs file had non-integer //(;'), or both, since the solution mean- 
dered in real space. 

In subsequent tests, we generated larger unit cells and applied 
a target over the empty part of the unit cell, in an attempt to re- 
strain the meandering problem. We embedded the molecule in a 
cell that was 2 or 3 times greater in each dimension. An empty 
target was used that essentially covered the empty 7/8-th or 
26/27-th of the unit cell, respectively. We applied a high relative 
weight for this empty target, and we still used the full molecule 
as a starting position. We found that both the larger cell and the 
empty target are of critical value in enabling SPEDEN to con- 
verge to the correct solution. Comparing the 2-fold larger unit 
cell versus the 3-fold unit cell, there was a great improvement 
in the latter case. These results show that for tri-linear interpo- 
lation, it is adequate to use a unit cell that is 3 times greater in 
each dimension. We expect that more sophisticated interpola- 
tion algorithms should allow using smaller unit cells. 

2.7. Dependence of Reconstruction on the Quantity of Input 
Data 

A three-fold enlarged unit cell increases the number of un- 
known amplitudes of the Gaussians, n(p), by a factor of 27. In 
principle, the emptiness of the volume around the molecule re- 
strains the effective number of independent unknowns. Never- 
theless, if the number of equations, which is given by the num- 
ber of entries in the Fobs file, is not increased, it is easy for 
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the solver to "hide" electrons among the large number of un- 
knowns in the system, even when the empty target constraint is 
used. In fact we found that when we compared the final Fcalc 
from Solve against the starting Fcalc, on the one hand, and the 
correct Fcalc on the other, Solve's final Fcalc was closer to the 
starting one than to the correct one. In other words, the cost 
function in reciprocal space was not a sufficiently strong con- 
straint in SPEDEN's algorithm, for this synthetic problem. In 
a similar real case, more experimental diffraction patterns need 
to be collected in order that SPEDEN would be able to find the 
corresponding image without other information. 

2.8. Recovery of Missing Atoms 

In the next set of synthetic tests, we attempted to recover 
missing information by starting from a partial model that con- 
tained less than the full complement of 15 atoms. In these sim- 
ulations, we used F b s files with non-integer //(;'), a three-fold 
enlarged unit cell, an empty target or a phase extension target, 
and randomly positioned atoms. 

We found that a low-resolution spatial target significantly 
helps SPEDEN to converge. Figure 1 (a) shows the results of the 
comparison of the reconstructed image with the electron den- 
sity from the full pdb file when a phase extension target is used. 
The phase extension target was calculated at a resolution of 10 
A. We found that it was generally possible to recover 5, 10, or 
even all 15 of the atoms. Please note that the amount of infor- 
mation in such a phase extension target is (4A/10A) 3 « 6% 
of the information in the perfect solution. 

It was more difficult to reconstruct the original electron den- 
sity when we used an empty target, as shown in Figure 1 (b). 
SPEDEN was able to recover 5 out of the 15 atoms, but did 
not converge when 10 atoms were unknown. Perhaps surpris- 
ingly, the case where there was no starting model at all (0 atoms 
known) did consistently better than those cases where a partial 
model was given as a starting condition. 

2.9. Effect of Non-Uniform Sampling on Recovery 

In this set of synthetic tests, we addressed the question of how 
difficult it is to recover the molecule from a non-uniform set of 
samples in reciprocal space, similar to real data sets, and how 
the results compare with the reconstruction from a uniformly- 
sampled data set. 

We generated two-dimensional diffraction patterns of the 
synthetic carbon molecule for different particle orientations, 
corresponding to recorded diffraction patterns in a "real" exper- 
iment. The two-dimensional diffraction patterns were linearly 
interpolated from a three-dimensional diffraction pattern; the 
latter was calculated on an additionally double-fine grid over 
the already triple-sized unit cells, i.e. using a unit cell that was a 
total of six times larger than the molecule in each direction. Fur- 
ther refining the grid of the three-dimensional diffraction pat- 
tern did not alter the results significantly. The three-dimensional 
diffraction pattern, in turn, was the Fourier transform of the 
gridded electron density of the synthetic molecule. 

Two-dimensional patterns do not sample the diffraction space 
uniformly. The sampling density near the center of the diffrac- 



tion space is much larger than the sampling density further 
away. We used a completeness measure to characterize the sam- 
pling uniformity. The reciprocal space is divided into cells that 
are 4ir/a by 4ir/b by A-kIc in size, where a, b, and c are the 
molecule sizes in each dimension. The completeness then is 
the ratio of cells in reciprocal space that contain at least one 
measurement over the total number of cells. Figure 2 shows the 
completeness of the input observation files as a function of the 
number of diffraction patterns. Also shown in Figure 2 is the 
number of calculated diffraction intensities. 

We then used SPEDEN to recover 7 out of the 15 atoms. The 
molecule was embedded in a unit cell that was three times larger 
in each dimension, and we used an empty solvent target. Fig- 
ure 3 shows the errors of the reconstructed electron density as 
a function of the number of two-dimensional diffraction pat- 
terns. The orientations of the diffraction patterns were chosen 
at random, and the calculations were repeated for four differ- 
ent molecules. For comparison, also shown in Figure 3 are the 
errors of the electron density of the 8 known atoms ("partial 
model"), and the errors of the reconstructed electron densities 
when a three-dimensional diffraction pattern is used which was 
oversampled three times ("integer hkl") or six times ("fractional 
hkl"). 

As discussed above, the R factor is not a useful measure to 
assess the quality of the reconstructed image, but the RMS and 
count errors are better measures for the reconstruction qual- 
ity. Surprisingly, we found that four two-dimensional patterns 
are sufficient to reconstruct the electron density as well as in 
the case the full three-dimensional diffraction pattern is given. 
Four two-dimensional patterns have a remarkably low sampling 
completeness of only 14%. Further increasing the complete- 
ness or the number of observations does not improve the qual- 
ity of the reconstruction. We would like to note, however, that 
these results could be dependent on the choice of the test model, 
and that for different test models the number of required two- 
dimensional patterns may be larger. 

2.10. Recovery of Two-Dimensional Data 

In the final set of tests, we demonstrate SPEDEN's ability to 
recover missing information for a two-dimensional configura- 
tion of 37 Au balls in a plane. We reconstructed the Au balls us- 
ing (i) a synthetic diffraction pattern and (ii) an experimentally 
obtained diffraction pattern as discussed by He et al., 2003. In 
the following we will discuss both cases, starting with the syn- 
thetic diffraction data. 

The 37 Au balls are arranged in a plane as shown in Figure 4. 
The arrangement of the balls is similar to the experimental case 
discussed by He et al., 2003. The Au balls were 50 nm in diam- 
eter. We generated an artificial set of "measurements" (|F bs|) 
by calculating the structure factors (F ca i c ) to 30 nm resolution 
and deleting the phases. The uncertainty of the measured struc- 
ture factors, a(h), were chosen according to Equation (10). We 
generated an initial model by smearing the full F ca i c file to 90 
nm, and running BACK on it. The initial model is shown in Fig- 
ure 5. In Figures 5 -7, we only show one plane. We also used 
this smeared F ca i c to generate a low-resolution spatial target as 
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well as an empty target outside the molecule. The correspond- 
ing weight function is shown in Figure 6. 

We then used Speden to reconstruct the Au balls. As shown 
in Figure 7, Speden reconstructed the electron density success- 
fully. However, we further found that if we use an empty starting 
model, Speden has difficulties converging to the correct elec- 
tron density. There are two reasons for this behavior. First, with- 
out an initial model, the symmetry of the system is not broken, 
and Speden stagnates since the support does not distinguish 
between the object and its centrosymmetric copy. Second, the 
mask and the reconstructed electron density are possibly shifted 
with respect to each other. If the initial model is empty, the posi- 
tion of the reconstructed electron density is mostly determined 
in the early iteration of the Solve algorithm and can be par- 
tially cut off by the solvent. The algorithm has difficulty shifting 
the result. It is necessary to provide information about the loca- 
tion of the electron density to a certain degree, for example in 
the form of a smeared model. Note that the GSF algorithms are 
designed to overcome these problems when there is abundant 
and accurate data available. 

We will now discuss the reconstruction of the Au balls using 
experimental data. To generate a starting model, we took the 
experimental |F t, s | data along with the phases obtained by He 
at al., 2003 using a version of the GSF algorithm, and smeared 
this data to 80 nm. The starting model is shown in Figure 8. 
We used the same data to generate a real space target with a 
target fraction of 99.7%, shown in Figure 9. Similar to the case 
of the synthetic test data, we chose er(h) according to Equation 
(10). We then used SPEDEN to reconstruct the Au balls. Fig- 
ure 10 (a) shows the reconstructed electron density, and Figure 
10 (b) shows a scanning electron microscope (SEM) picture of 
the sample. We found that SPEDEN reconstructed the electron 
density from the experimental data successfully. 

3. Summary and Conclusions 

In this paper we presented SPEDEN, a method to reconstruct 
the electron density of single particles from their x-ray diffrac- 
tion patterns, using an adaptation of the Holographic Method in 
crystallography. Unlike existing GSF algorithms, SPEDEN min- 
imizes a cost function that includes all the constraints of both 
real space and reciprocal space, by varying quantities in real 
space only, so that the cost evaluation requires only a forward 
transform from real space to reciprocal space. SPEDEN finds a 
local minimum of the cost function using the conjugate gradient 
algorithm. We implemented SPEDEN as a computer program, 
and tested it on synthetic and experimental data. Our initial re- 
sults indicate that SPEDEN works well. 
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Figure 1 

(a) Results of the comparison of the reconstructed image with the electron den- 
sity from the full pdb file in the cases (a) a phase extension target and (b) low- 
resolution target are used. The phase error is given in units of degrees. 
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Zuo, J., Vartanyants, I., Gao, M., Zhang, M., Nagahara, L., 2003. 
Atomic resolution imaging of a carbon nanotube from diffraction 
intensities. Science 300, p. 1419. 



Figure 2 

The completeness and length of the input observation files used for the calcu- 
lations shown in Figure 3. 



research papers 





Figure 3 

The error in the reconstructed electron density, as a function of the number 
of two-dimensional diffraction patterns. The orientation of the diffraction pat- 
terns was chosen at random, and the results were repeated for four different 
molecules. Also shown are the error of the electron density of the 8 known 
atoms ("partial model"), and the error of the reconstructed electron densi- 
ties for a three-times ("integer hkl") and six-times ("fractional hkl") in-each- 
dimension-oversampled three-dimensional diffraction pattern. 



Figure 5 

Initial starting model used for two-dimensional reconstruction from synthetic 
data. 
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Figure 4 

Planar arrangement of 37 Au balls for two-dimensional reconstruction. 



Figure 6 

Weight function used for two-dimensional reconstruction from synthetic data. 
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Figure 7 

Reconstructed electron density using synthetic data. 
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Figure 8 

Initial starting model for two-dimensional reconstruction from experimental 
data. 




Figure 9 

Weight function used for two-dimensional reconstruction from experimental 
data. 
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Figure 10 

(a) Reconstructed electron density using experimental data, (b) SEM Image of 
Au balls. 



